Recoding data
Problem
You want to recode data or calculate new data columns from existing ones.
Solution
Several methods of recoding data are shown below: R's built-in indexing and assignment, the match function, and the cut function. The recode function in the car package is another option.
The examples below will use this data:
    data <- read.table(header=TRUE, text='
     subject sex control cond1 cond2
           1   M     7.9  12.3  10.7
           2   F     6.3  10.6  11.1
           3   F     9.5  13.1  13.8
           4   M    11.5  13.4  12.9
    ')
Recoding a categorical variable
Code male as 1 and female as 2, and store the result in a new column.
    data$scode[data$sex=="M"] <- "1"
    data$scode[data$sex=="F"] <- "2"

    # Convert the column to a factor
    data$scode <- factor(data$scode)
    #  subject sex control cond1 cond2 scode
    #        1   M     7.9  12.3  10.7     1
    #        2   F     6.3  10.6  11.1     2
    #        3   F     9.5  13.1  13.8     2
    #        4   M    11.5  13.4  12.9     1
Another way to do it is to use the match function:
    oldvalues <- c("M", "F")
    newvalues <- factor(c("g1","g2"))  # Make this a factor

    data$scode <- newvalues[ match(data$sex, oldvalues) ]
    #  subject sex control cond1 cond2 scode
    #        1   M     7.9  12.3  10.7    g1
    #        2   F     6.3  10.6  11.1    g2
    #        3   F     9.5  13.1  13.8    g2
    #        4   M    11.5  13.4  12.9    g1
If, instead of creating a new column, you just want to rename the levels from "M" and "F" to something else, see Renaming levels of a factor.
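A named character vector can also serve as a lookup table for this kind of recoding. The following is a minimal sketch, not part of the original examples; the names `sex`, `lookup`, and `scode` are illustrative stand-ins.

```r
# Illustrative data; stands in for data$sex from the examples above
sex <- c("M", "F", "F", "M")

# Hypothetical lookup table: names are the old values, elements are the new codes
lookup <- c(M = "1", F = "2")

# Index the lookup table by the old values (as.character guards against
# sex being a factor, which would otherwise index by integer code),
# drop the names, and convert the result to a factor
scode <- factor(unname(lookup[as.character(sex)]))
print(scode)
```

Like match, this approach keeps the old-to-new mapping in one place, which may be easier to maintain when there are many values to recode.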
Recoding a continuous variable into a categorical variable
Mark those whose control measurement is <7 as "low", and those with >=7 as "high":
    data$category[data$control< 7] <- "low"
    data$category[data$control>=7] <- "high"

    # Convert the column to a factor
    data$category <- factor(data$category)
    #  subject sex control cond1 cond2 scode category
    #        1   M     7.9  12.3  10.7    g1     high
    #        2   F     6.3  10.6  11.1    g2      low
    #        3   F     9.5  13.1  13.8    g2     high
    #        4   M    11.5  13.4  12.9    g1     high
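For a two-way split like this one, the base R function ifelse offers a one-line alternative. This is a sketch on a standalone vector rather than the method used above; `control` and `category` here are illustrative copies of the values in the example data.

```r
# Illustrative values; same as the control column in the example data
control <- c(7.9, 6.3, 9.5, 11.5)

# ifelse() picks "low" where the condition is TRUE and "high" elsewhere;
# an NA in control would yield NA in the result
category <- factor(ifelse(control < 7, "low", "high"))
print(category)
```

With more than two categories, nested ifelse calls become hard to read, and cut is usually the more convenient choice.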
With the cut function, you specify boundaries and the resulting values:
    data$category <- cut(data$control,
                         breaks=c(-Inf, 7, 9, Inf),
                         labels=c("low","medium","high"))
    #  subject sex control cond1 cond2 scode category
    #        1   M     7.9  12.3  10.7    g1   medium
    #        2   F     6.3  10.6  11.1    g2      low
    #        3   F     9.5  13.1  13.8    g2     high
    #        4   M    11.5  13.4  12.9    g1     high
By default, the ranges are open on the left and closed on the right, as in (7,9]. To make the ranges closed on the left and open on the right, like [7,9), use right=FALSE.
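The effect of right=FALSE only shows up for values that fall exactly on a boundary. The sketch below uses two such values, 7 and 9, which are not in the example data; the variable names are illustrative.

```r
# Values that sit exactly on the break points
x      <- c(7, 9)
breaks <- c(-Inf, 7, 9, Inf)
labels <- c("low", "medium", "high")

# Default (right=TRUE): intervals are (-Inf,7], (7,9], (9,Inf]
default_bins <- cut(x, breaks = breaks, labels = labels)

# right=FALSE: intervals are [-Inf,7), [7,9), [9,Inf)
left_closed <- cut(x, breaks = breaks, labels = labels, right = FALSE)

print(data.frame(x, default_bins, left_closed))
# 7 is "low" by default but "medium" with right=FALSE;
# 9 is "medium" by default but "high" with right=FALSE
```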
Calculating a new continuous variable
Suppose you want to add a new column with the sum of the three measurements.
    data$total <- data$control + data$cond1 + data$cond2
    #  subject sex control cond1 cond2 scode category total
    #        1   M     7.9  12.3  10.7    g1   medium  30.9
    #        2   F     6.3  10.6  11.1    g2      low  28.0
    #        3   F     9.5  13.1  13.8    g2     high  36.4
    #        4   M    11.5  13.4  12.9    g1     high  37.8
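When there are many columns to add up, the base R function rowSums may be more convenient than writing out each term. The following sketch rebuilds just the three measurement columns so that it runs on its own; the column selection by name is an illustrative choice.

```r
# Only the measurement columns from the example data
data <- data.frame(control = c(7.9, 6.3, 9.5, 11.5),
                   cond1   = c(12.3, 10.6, 13.1, 13.4),
                   cond2   = c(10.7, 11.1, 13.8, 12.9))

# Sum across the selected columns for each row
data$total <- rowSums(data[, c("control", "cond1", "cond2")])
print(data)
```

rowSums also accepts na.rm=TRUE to skip missing values, whereas adding columns with + gives NA for any row that contains a missing value.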